Distance Based Generalisation
Abstract
Many distance-based methods in machine learning are able to identify similar cases or prototypes from which decisions can be made. The explanation given is usually based on expressions such as "because case a is similar to case b". However, a more general or meaningful pattern, such as "because case a has properties x and y (as b has)", is usually more difficult to find. Even when such a pattern is found, its connection with the original distance-based method is generally unclear, or even inconsistent. In this paper, we study the connection between the concept of distance (or similarity) and the concept of generalisation. More precisely, we define several conditions that a sensible distance-based generalisation must satisfy. From these, we are able to tell whether a generalisation operator for a pattern representation language is consistent with the metric space defined by the underlying distance. We show sensible pattern languages and generalisation operators for typical data types: nominal, numerical, sets and lists. We also explore a possible relationship between the well-known concepts of lgg and distances between terms, and the definition of generalisation presented in this paper.
Keywords: distance-based methods, generalisation operators, lgg, metric space.
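As a rough illustration of what it could mean for a generalisation to be consistent with a distance, the sketch below uses numerical data. It assumes, for illustration only, that a pattern generalising two values should cover every value lying "between" them in the metric sense, that is, every x with d(a, x) + d(x, b) = d(a, b). This condition, the interval pattern language and all function names are hypothetical choices made here; they are not quoted from the abstract and may differ from the conditions the paper actually defines.

```python
from typing import Callable, Iterable, Tuple

Interval = Tuple[float, float]


def abs_distance(a: float, b: float) -> float:
    """Absolute difference, a standard distance for numerical data."""
    return abs(a - b)


def is_between(x: float, a: float, b: float,
               dist: Callable[[float, float], float],
               tol: float = 1e-9) -> bool:
    """True when x makes the triangle inequality tight for a and b.

    Assumption: this is used here as the notion of "lying between"
    two elements of a metric space.
    """
    return abs(dist(a, x) + dist(x, b) - dist(a, b)) <= tol


def interval_generalisation(a: float, b: float) -> Interval:
    """Hypothetical generalisation operator: the closed interval spanning a and b."""
    return (min(a, b), max(a, b))


def covers(pattern: Interval, x: float) -> bool:
    """True when the interval pattern covers the value x."""
    lo, hi = pattern
    return lo <= x <= hi


def includes_all_between(pattern: Interval, a: float, b: float,
                         candidates: Iterable[float],
                         dist: Callable[[float, float], float]) -> bool:
    """Check that the pattern covers every candidate lying between a and b."""
    return all(covers(pattern, x)
               for x in candidates
               if is_between(x, a, b, dist))


if __name__ == "__main__":
    a, b = 2.0, 5.0
    candidates = [1.0, 2.0, 3.5, 5.0, 6.0]
    pattern = interval_generalisation(a, b)
    print(pattern, includes_all_between(pattern, a, b, candidates, abs_distance))
    # A pattern that leaves out 3.5 fails the check under this assumption.
    print(includes_all_between((2.0, 3.0), a, b, candidates, abs_distance))
```

Under this assumed condition, the interval spanning both values passes the check, while a narrower interval that omits an intermediate value does not. Analogous checks for nominal data, sets or lists would need their own distance and pattern language.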
Similar resources
Bridging the Gap between Distance and Generalization
Distance-based and generalisation-based methods are two families of artificial intelligence techniques that have been successfully used over a wide range of real-world problems. In the first case, general algorithms can be applied to any data representation by just changing the distance. The metric space sets the search and learning space, which is generally instance-oriented. In the second cas...
Generalisation Operators for Lists Embedded in a Metric Space
In some application areas, similarities and distances are used to calculate how similar two objects are in order to use these measurements to find related objects, to cluster a set of objects, to make classifications or to perform an approximate search guided by the distance. In many other application areas, we require patterns to describe similarities in the data. These patterns are usually co...
An Instantiation for Sequences of Hierarchical Distance-based Conceptual Clustering
In this work, we present an instantiation of our framework for Hierarchical Distance-based Conceptual Clustering (HDCC) using sequences, a particular kind of structured data. We analyse the relationship between distances and generalisation operators for sequences in the context of HDCC. HDCC is a general approach to conceptual clustering that extends the traditional algorithm for hierarchical c...
Minimal Distance-Based Generalisation Operators for First-Order Objects
Distance-based methods have been a successful family of machine learning techniques since the inception of the discipline. Basically, the classification or clustering of a new individual is determined by the distance to one or more prototypes. From a comprehensibility point of view, this is not especially problematic in propositional learning where prototypes can be regarded as a good generalis...
Binarising SIFT-Descriptors to Reduce the Curse of Dimensionality in Histogram-Based Object Recognition
It is shown that distance computations between SIFT-descriptors using the Euclidean distance suffer from the curse of dimensionality. The search for exact matches is less affected than the generalisation of image patterns, e.g. by clustering methods. Experimental results indicate that for the case of generalisation, the Hamming distance on binarised SIFT-descriptors is a much better choice. It i...
Publication date: 2005